Abstract. Pairwise ordered tree alignments are combinatorial objects that
appear in RNA secondary structure comparison. However, the usual
representation of tree alignments as supertrees is ambiguous, i.e. two distinct
supertrees may induce identical sets of matches between identical pairs of
trees. This ambiguity is uninformative, and detrimental to any probabilistic
analysis.

In this work, we consider tree alignments up to equivalence. Our first
result is a precise asymptotic enumeration of tree alignments, obtained from
a context-free grammar by means of basic analytic combinatorics. Our second
result focuses on alignments between two given ordered trees S and T. By
refining our grammar to align specific trees, we obtain a decomposition
scheme for the space of alignments, and use it to design an efficient
dynamic programming algorithm for sampling alignments under the
Gibbs-Boltzmann probability distribution. This generalizes existing
tree alignment algorithms, and opens the door to a probabilistic analysis
of the space of suboptimal RNA secondary structure alignments.


1 Introduction 

Tree alignments are the natural analog of sequence alignments, and were
introduced by Jiang, Wang and Zhang [9] to model and quantify the similarity
between two (ordered¹) trees. Initially proposed as an alternative to tree-edit
distance, the tree alignment model has proven more robust, allowing for the
inclusion of complex local operations [2], and for a generalization to
multiple input trees [8]. Consequently, tree alignment has been used in a wide
array of applicative contexts, especially RNA Bioinformatics [7], where RNA
secondary structure alignments can be encoded by tree alignments. The
minimal cost tree alignment between two trees of sizes $n_1$ and $n_2$, under
classic insertion/deletion/(mis)match operations, can be computed using dynamic
programming (DP). The current best algorithms have worst-case time and
space complexities in $O(n_1 n_2 (n_1 + n_2)^2)$ and $O(n_1 n_2 (n_1 + n_2))$
respectively [9], and average-case time and space complexities (on uniformly
drawn instances) in $O(n_1 n_2)$ [6].

In the context of sequence alignments, the enumeration of alignments has
been the object of much interest in Computational Biology [14,12,1]. Alignments
between two sequences over an alphabet $\Sigma$ can be encoded as sequences over
an extended alphabet $\Sigma_a$, representing insertions, deletions and
(mis)matches (e.g. $\Sigma = \{a,b\}$, $\Sigma_a = \{(a,-), (b,-), (-,a), (-,b),
(a,b), (a,a), (b,a), (b,b)\}$). Many sequences over $\Sigma_a$ are equivalent if
one considers only the (mis)matches of the alignments, i.e. they align
sequences of the same lengths and induce the same sets of matched positions
(e.g. $(a,-),(-,b)$ and $(-,b),(a,-)$). It is a natural problem to enumerate
distinct sequence alignments for two sequences of cumulated length $n$
[14, pp. 188]. Beyond purely theoretical considerations, the decompositions
introduced for enumerating distinct sequence alignments were adapted into
DP algorithms, e.g. for probabilistic alignment based on expectation
maximization [3], or to compute Gibbs-Boltzmann measures of reliability [13].

¹ In this work, unless explicitly specified, all trees will be rooted and ordered.

In the present work, we consider similar questions on tree alignments. We
are first interested in counting distinct tree alignments, i.e. enumerating, up to
equivalence, ordered trees whose vertices are labeled in $\Sigma_a$ (called
supertrees from now on). For trees, the notion of equivalence of alignments
generalizes that of sequence alignments, i.e. two alignments are equivalent
when they align the same pairs of trees, and induce the same sets of
(mis)matched positions. Unfortunately, in contrast with the case of sequence
alignments, existing DP algorithms for computing an optimal tree alignment
[9,2,11] cannot be easily adapted into enumeration schemes for tree alignments
up to equivalence. This additional difficulty is due to the existence of
ambiguities of different natures.


Our main contribution is a grammar for (distinct) tree alignments, which
provably generates a single representative for each equivalence class. We use
the symbolic method [5] to obtain the generating function of tree alignments,
from which asymptotic equivalents for various statistics of interest can easily
be derived, such as the average number of alignments over trees of total size
$n$. Finally, and perhaps more importantly from an applied point of view, the
grammar can be transformed into an unambiguous and complete DP algorithm for
aligning two input trees. The resulting algorithm has the same asymptotic
worst-case and average-case complexities, up to reasonable constants, as the
current best (ambiguous) algorithms [9,2]. The main interest of such an
algorithm is that it immediately opens the way to new applications for the
tree alignment model, including a critical assessment of the reliability of
optimal alignments, obtained either by counting co-optimal alignments, or by
sampling suboptimal alignments according to a Gibbs-Boltzmann distribution
(see [10] for an example of this approach for the RNA folding problem).


In Section 2, we introduce the main definitions about trees, supertrees and
tree alignments. In Section 3, we provide a grammar that generates all tree
alignments. In Section 4.1, we analyze this grammar from an enumerative point
of view and give precise results on the number of alignments of fixed size.
Finally, in Section 4.2, we show how to transform the tree alignment grammar
into a dynamic programming algorithm to sample tree alignments between two
specified trees.





2 Definitions 


Trees and supertrees. Let $\Sigma$ be an alphabet. A tree $T$ on $\Sigma$ is a
rooted plane tree whose vertices are labeled by elements of $\Sigma$. We denote
by $V_T$ the set of vertices of $T$. We remove a non-root vertex $v$ from a tree
$T$ by contracting the edge between $v$ and its parent $u$, which keeps its
label. Removing the root $r$ of a tree consists in creating a forest composed of
the subtrees rooted at the children of $r$. We denote the operation of removing
a vertex $v$ from $T$ by $T - v$.

We denote by $\Sigma_a$ the alphabet defined by
$\Sigma_a = (\Sigma \cup \{-\})^2 - \{(-,-)\}$. An element $(x,y) \in \Sigma_a$
is an insertion (resp. deletion, match) if $y = -$ (resp. $x = -$,
$(x,y) \in \Sigma^2$). A supertree $A$ is a tree on $\Sigma_a$; a vertex of $A$
is an insertion (resp. deletion, match) if its label is an insertion (resp.
deletion, match). The size of a supertree $A$ is the number of its insertions
and deletions, plus twice the number of its matches. A superforest is an
ordered sequence of supertrees.

Given a supertree $A$ on $\Sigma_a$, we define two forests $\pi_1(A)$ and
$\pi_2(A)$ as follows: $\pi_1(A)$ (resp. $\pi_2(A)$) is obtained by (1)
iteratively removing all deletions (resp. insertions) of $A$, in an arbitrary
order, and (2) replacing the label $(x,y)$ of each remaining vertex by $x$
(resp. $y$). We refer to Fig. 1 for an illustration. We extend the notations
$\pi_1$ and $\pi_2$ to vertices: for a non-deletion (resp. non-insertion) vertex
$v$ of $A$, we denote by $\pi_1(v)$ (resp. $\pi_2(v)$) the corresponding vertex
in $\pi_1(A)$ (resp. $\pi_2(A)$). A vertex $x$ of $\pi_1(A)$ such that
$\pi_1^{-1}(x)$ is an insertion (resp. match) is said to be inserted (resp.
matched) in $A$. Similarly, a vertex $y$ of $\pi_2(A)$ such that
$\pi_2^{-1}(y)$ is a deletion (resp. match) is said to be deleted (resp.
matched) in $A$.
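
The two projections can be illustrated by a minimal Python sketch. The class names `Node` and `Tree`, the function names and the gap symbol "-" are hypothetical choices made for this illustration only; they are not part of the paper. A vertex whose relevant component is the gap symbol is removed, and its children take its place, which also covers the removal of a root (the result is then a forest).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

GAP = "-"  # assumed encoding of the gap symbol

@dataclass
class Node:
    """Vertex of a supertree, labeled by a pair (x, y) other than (GAP, GAP)."""
    label: Tuple[str, str]
    children: List["Node"] = field(default_factory=list)

@dataclass
class Tree:
    """Vertex of an ordinary ordered tree."""
    label: str
    children: List["Tree"] = field(default_factory=list)

def project(node: Node, side: int) -> List[Tree]:
    """Forest obtained by keeping component `side` (0 for pi_1, 1 for pi_2)."""
    kids = [t for child in node.children for t in project(child, side)]
    letter = node.label[side]
    if letter == GAP:
        return kids              # vertex removed: its children replace it
    return [Tree(letter, kids)]

def size(node: Node) -> int:
    """Insertions and deletions count once, matches count twice."""
    own = 1 if GAP in node.label else 2
    return own + sum(size(child) for child in node.children)
```

With this encoding, the size of a supertree equals the total number of vertices of its two projections.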

Tree alignments. As the forests $\pi_1(A)$ and $\pi_2(A)$ are embedded into the
supertree $A$, the latter implicitly defines an alignment between the forests
$\pi_1(A)$ and $\pi_2(A)$, i.e. a set of correspondences between vertices of
$\pi_1(A)$ and $\pi_2(A)$ that is consistent with the structure of both
forests [9]. We refer to Fig. 1 for an illustration.



Fig. 1. A supertree $A_1$ with alphabet $\Sigma = \{A, C, G, U\}$, and the
associated trees $S = \pi_1(A_1)$ and $T = \pi_2(A_1)$. The alignment of $S$ and
$T$ defined by $A_1$ is composed of two pairs of matched vertices, $(A, A)$ and
$(U, A)$, indicated by dashed arrows.


We now turn to the central notion of equivalent alignments, i.e. alignments of
identical pairs of trees that contain exactly the same set of matched vertices.
Given a supertree $A$, representing an alignment between two trees
$S = \pi_1(A)$ and $T = \pi_2(A)$, the set of matches of $A$ is formed by the
elements $(x,y)$ of $V_S \times V_T$ such that $\pi_1^{-1}(x) = \pi_2^{-1}(y)$
(i.e. there exists a vertex $v$ of $A$ such that $\pi_1(v) = x$ and
$\pi_2(v) = y$). Two supertrees $A_1$ and $A_2$ are equivalent if
$\pi_1(A_1) = \pi_1(A_2)$, $\pi_2(A_1) = \pi_2(A_2)$, and the sets of matches of
$A_1$ and $A_2$ are identical (see Fig. 2 for an illustration).
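
The equivalence test can be sketched as follows, under assumptions that are ours and not the paper's: a supertree is encoded as a nested pair `((x, y), children)`, the gap is the string "-", and a vertex of a projection is identified by its preorder rank. The last choice relies on the fact that removing vertices from an ordered tree does not change the relative preorder of the remaining ones.

```python
GAP = "-"  # assumed encoding of the gap symbol

def project(node, side):
    """Forest pi_1 (side 0) or pi_2 (side 1) as nested (letter, children) pairs."""
    label, children = node
    kids = [t for child in children for t in project(child, side)]
    return kids if label[side] == GAP else [(label[side], kids)]

def matches(root):
    """Set of (i, j): the i-th vertex of pi_1 is matched with the j-th of pi_2."""
    found, counters = set(), [0, 0]

    def visit(node):
        (x, y), children = node
        i = j = None
        if x != GAP:                      # vertex survives in pi_1
            i, counters[0] = counters[0], counters[0] + 1
        if y != GAP:                      # vertex survives in pi_2
            j, counters[1] = counters[1], counters[1] + 1
        if i is not None and j is not None:
            found.add((i, j))
        for child in children:
            visit(child)

    visit(root)
    return found

def equivalent(a, b):
    """Same projections and same set of matches."""
    return (project(a, 0) == project(b, 0)
            and project(a, 1) == project(b, 1)
            and matches(a) == matches(b))
```

For instance, a match root with an insertion placed above a deletion, and the same root with the deletion placed above the insertion, are two distinct supertrees that this test reports as equivalent.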



Fig. 2. Two non-equivalent supertrees, representing two different tree
alignments. However, the supertree $A_1$ from Fig. 1 and the supertree $A_2$
are equivalent.


A tree alignment is then defined as an equivalence class of supertrees, with
respect to the above-defined equivalence relation, for which $\pi_1(A)$ and
$\pi_2(A)$ are trees. The notion of forest alignment is similarly defined when
$\pi_1(A)$ and $\pi_2(A)$ are not restricted to trees. Given a set
$\mathcal{S}$ of tree (resp. forest) alignments, a set $\mathcal{T}$ of
supertrees (resp. superforests) is said to be representative of $\mathcal{S}$ if
it contains exactly one supertree (resp. superforest) for each alignment (i.e.
equivalence class of supertrees or superforests) in $\mathcal{S}$. Tree
alignments will now be the focus of our work.


3 A grammar for tree alignments 


In this section, we describe a context-free grammar for a set $\mathcal{A}$ of
supertrees that is representative of the set of all tree alignments.

We first define some basic operations on supertrees and superforests: 

- The (ordered) concatenation of two (super)forests $A$ and $B$ is denoted by
$A \circ B$. It creates a new superforest beginning with the supertrees of $A$,
and ending with the supertrees of $B$.

- Given two disjoint sets $\mathcal{T}_1$ and $\mathcal{T}_2$ of supertrees or
superforests, we denote by $\mathcal{T}_1 \oplus \mathcal{T}_2$ their (disjoint)
union.

- For any superforest $A$ and $a, b \in \Sigma$, InsRoot$(A, a)$ (resp.
DelRoot$(A, b)$, MatchRoot$(A, a, b)$) denotes the supertree whose root is the
vertex $(a, -)$ (resp. $(-, b)$, $(a, b)$) and whose children are the supertrees
in $A$, in the same order as in $A$.


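$$\mathcal{A} = \mathcal{V}^{\emptyset} \oplus \mathcal{T}_I \oplus \mathcal{T}_D \oplus \mathrm{InsRoot}(\mathcal{F}_I \circ \mathcal{T}_D) \tag{1}$$

$$\mathcal{T}_I = \mathrm{InsRoot}(\mathcal{F}_I), \qquad \mathcal{F}_I = \{\text{empty superforest}\} \oplus \mathrm{InsRoot}(\mathcal{F}_I) \circ \mathcal{F}_I \tag{2}$$

$$\mathcal{T}_D = \mathrm{DelRoot}(\mathcal{F}_D), \qquad \mathcal{F}_D = \{\text{empty superforest}\} \oplus \mathrm{DelRoot}(\mathcal{F}_D) \circ \mathcal{F}_D \tag{3}$$

$$\mathcal{V}^{\emptyset} = \mathcal{V}^{\uparrow} \oplus \mathrm{InsRoot}(\mathcal{VH}) \tag{4}$$

$$\mathcal{V}^{\uparrow} = \mathrm{MatchRoot}(\mathcal{H}_{\not{D},\emptyset,\emptyset}) \oplus \mathrm{DelRoot}(\mathcal{F}_D \circ \mathcal{V}^{\uparrow} \circ \mathcal{F}_D) \tag{5}$$

$$\mathcal{VH} = \mathcal{T}_I \circ \mathcal{VH} \oplus \mathcal{V}^{\emptyset} \circ \mathcal{F}_I \oplus \mathrm{DelRoot}(\mathcal{H}_{\not{D},\leftrightarrow,\emptyset}) \circ \mathcal{F}_I \tag{6}$$

For every $\nu, M, M'$ with $\nu \in \{\not{D}, D\}$ and $M, M' \in \{\emptyset, \leftrightarrow, \rightarrow\}$:

$$\mathcal{H}_{\nu,M,M'} =
\begin{array}{lll}
 & \{\text{empty superforest}\} & \text{if } (M,M') = (\emptyset,\emptyset)\\
\oplus & \mathcal{T}_I \circ \mathcal{H}_{\not{D},M,M'} & \text{if } \nu \neq D \text{ and if } M \neq \leftrightarrow\\
\oplus & \mathcal{T}_D \circ \mathcal{H}_{D,M,M'} & \text{if } M' \neq \leftrightarrow\\
\oplus & \mathcal{V}^{\uparrow} \circ \mathcal{H}^{1,1}_{M,M'} & \\
\oplus & \mathrm{InsRoot}(\mathcal{H}_{\not{D},\emptyset,\leftrightarrow}) \circ \mathcal{H}^{1,+}_{M,M'} & \\
\oplus & \mathrm{DelRoot}(\mathcal{H}_{\not{D},\leftrightarrow,\emptyset}) \circ \mathcal{H}^{+,1}_{M,M'} &
\end{array}
\tag{7}$$

For every $M, M' \in \{\emptyset, \leftrightarrow, \rightarrow\}$ and $i, j \in \{1, +\}$:

$$\mathcal{H}^{i,j}_{M,M'} = \mathcal{H}_{\not{D},\alpha(M),\alpha(M')} \oplus
\begin{cases}
\mathcal{F}_I & \text{if } M = \emptyset \text{ and } M' = \rightarrow\\
\mathcal{F}_I & \text{if } M = \emptyset,\ M' = \leftrightarrow \text{ and } j = +\\
\mathcal{F}_D & \text{if } M = \rightarrow \text{ and } M' = \emptyset\\
\mathcal{F}_D & \text{if } M = \leftrightarrow,\ M' = \emptyset \text{ and } i = +\\
\emptyset & \text{otherwise}
\end{cases}
\tag{8}$$

where $\alpha(\emptyset) = \emptyset$ and $\alpha(\leftrightarrow) = \alpha(\rightarrow) = \rightarrow$.


Fig. 3. A context-free grammar for $\mathcal{A}$, a representative set of all tree alignments.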


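- We naturally extend these operators to a set $\mathcal{T}$ of supertrees or
superforests:

$$\mathrm{InsRoot}(\mathcal{T}) = \bigoplus_{A \in \mathcal{T},\, a \in \Sigma} \mathrm{InsRoot}(A, a), \qquad \mathrm{DelRoot}(\mathcal{T}) = \bigoplus_{A \in \mathcal{T},\, a \in \Sigma} \mathrm{DelRoot}(A, a),$$

$$\mathrm{MatchRoot}(\mathcal{T}) = \bigoplus_{A \in \mathcal{T},\, (a,b) \in \Sigma^2} \mathrm{MatchRoot}(A, a, b).$$

Our grammar is described in Fig. 3 and illustrated in Fig. 4.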

Theorem 1. The set of supertrees $\mathcal{A}$ generated by the grammar (1)-(8)
is representative of the set of all tree alignments; i.e. $\mathcal{A}$ contains
exactly one supertree for each equivalence class of supertrees.

The key ingredient to prove Theorem 1 stems from the following (semantic)
properties of the classes of supertrees and forests that appear in the grammar:





Fig. 4. A schematic illustration of the grammar for tree alignments.



1. Supertrees in $\mathcal{T}_I$ (resp. $\mathcal{T}_D$) contain only insertion
(resp. deletion) vertices.

2. $\mathcal{F}_I$ (resp. $\mathcal{F}_D$) is the set of superforests formed by
supertrees of $\mathcal{T}_I$ (resp. $\mathcal{T}_D$).

3. For $p \in \{\emptyset, \uparrow\}$, $\mathcal{V}^p$ is representative of the
set of alignments $A$ with at least one match, such that, if $p = \uparrow$,
then the root of $\pi_1(A)$ is matched.

4. $\mathcal{VH}$ is representative of the set of forest alignments $A$ with at
least one match, such that $\pi_2(A)$ is a tree.

5. For $\nu \in \{\not{D}, D\}$ and $(M, M') \in \{\emptyset, \leftrightarrow,
\rightarrow\}^2$, $\mathcal{H}_{\nu,M,M'}$ is representative of the set of
superforests $A$ such that

- if $\pi_1(A) \neq \emptyset$ and $\nu = D$, then the first tree of $\pi_1(A)$
is matched in $A$;

- if $M = \rightarrow$, then the last tree of $\pi_1(A)$ is matched in $A$ (so
$\pi_1(A) \neq \emptyset$);

- if $M' = \rightarrow$, then the last tree of $\pi_2(A)$ is matched in $A$ (so
$\pi_2(A) \neq \emptyset$);

- if $M = \leftrightarrow$, then the first and last trees in $\pi_1(A)$ are
matched in $A$ (so $\pi_1(A)$ has at least two trees);

- if $M' = \leftrightarrow$, then the first and last trees in $\pi_2(A)$ are
matched in $A$ (so $\pi_2(A)$ has at least two trees).

6. For $(i, j) \in \{1, +\}^2$, $\mathcal{H}^{i,j}_{M,M'}$ is representative of
the superforests $A'$ such that

- there exists a superforest $A$ such that $A \circ A' \in
\mathcal{H}_{\nu,M,M'}$;

- if $i = 1$ (resp. $+$), $\pi_1(A)$ is a tree (resp. a forest with at least two
trees);

- if $j = 1$ (resp. $+$), $\pi_2(A)$ is a tree (resp. a forest with at least two
trees).

These properties can be verified recursively through a tedious analysis of
the grammar, and imply quite straightforwardly that $\mathcal{A}$ contains
exactly one supertree per equivalence class of supertrees.
















Remark 1 For sequence alignments, a grammar generating a representative set of
sequence alignments can be easily adapted from the grammar generating all
sequences over $\Sigma_a$, e.g. by preventing any deletion from immediately
preceding an insertion. In the case of trees, the two-dimensional nature of the
objects seems to forbid such a simple characterization, and seems to
intrinsically mandate intricate combinatorial constructs/grammars. Note however
that our grammar, while complex, remains amenable to efficient computations
(Section 4).
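
The sequence case of Remark 1 is small enough to be checked by brute force. The sketch below is an illustration under our own conventions, not code from the paper: alignments are written as words over the hypothetical letters I (insertion), D (deletion) and M (match or mismatch), and the canonical representatives are the words in which no D is immediately followed by an I.

```python
from math import comb

def alignments(m, n):
    """All words over {I, D, M} aligning sequences of lengths m and n."""
    if m == 0 and n == 0:
        yield ""
        return
    if m > 0:                       # letter of the first sequence only
        for w in alignments(m - 1, n):
            yield "I" + w
    if n > 0:                       # letter of the second sequence only
        for w in alignments(m, n - 1):
            yield "D" + w
    if m > 0 and n > 0:             # (mis)match
        for w in alignments(m - 1, n - 1):
            yield "M" + w

def matched_pairs(word):
    """Set of matched positions (i, j) induced by an alignment word."""
    i = j = 0
    pairs = []
    for op in word:
        if op == "M":
            pairs.append((i, j))
        i += op in "IM"
        j += op in "DM"
    return frozenset(pairs)

def canonical(m, n):
    """Words in which no deletion immediately precedes an insertion."""
    return [w for w in alignments(m, n) if "DI" not in w]
```

For lengths 2 and 2, there are 13 words but only 6 canonical ones, one per set of matched positions; 6 is the binomial coefficient `comb(4, 2)`, which counts the order-preserving partial matchings between the two sequences.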

4 Applications 

4.1 Enumerating tree alignments 

For the sake of simplicity, we will restrict our attention to $|\Sigma| = 1$,
i.e. the alphabet is restricted to a single letter. The general case follows
easily, and will be described in an extended version of the paper.

For a family $\mathcal{F}$ of superforests, we define a bivariate ordinary
generating function

$$F(t,z) = \sum_{n \ge 0,\, k \ge 0} f_{n,k}\, t^n z^k,$$

where $f_{n,k}$ is the number of superforests in $\mathcal{F}$ of size $n$ with
$k$ matches.

Using the symbolic method [5], one classically translates the specification
described by Eqs. (1)-(8) into a system of functional equations relating the
generating functions of the sets of supertrees and forests. To that purpose,
classes of objects are replaced by their generating functions, disjoint unions
(resp. concatenations) of two sets of supertrees are replaced by additions
(resp. multiplications) of their generating functions, the addition of a root
translates into a multiplication by a monomial $t^2 z$ (resp. $t$) if the root
represents a match (resp. insertion/deletion), since a match contributes 2 to
the size, and empty superforests and empty sets translate into 1 and 0
respectively. The grammar is context-free, so the resulting system is algebraic
and can be solved to yield the following characterization result.

Theorem 2. The generating functions $T(t,z)$ and $F(t,z)$ of tree and forest
alignments, whose size and number of matches are marked by $t$ and $z$
respectively, satisfy

T{t,z)= +t-fz+F{t,z), (9) 

$$\left(t^2 z\, C(t)^2 + 2t\right) F(t,z)^2 + \left(-2t\, C(t)^2 - 1\right) F(t,z) + C(t)^2 = 0, \tag{10}$$

where $C(t) = (1 - \sqrt{1 - 4t})/(2t)$ is the generating function of Catalan numbers.
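
As a small illustration of the translation rules, consider the insertion-only superforests over a single letter: such a forest is either empty, or an inserted root on top of such a forest, followed by such a forest. The symbolic method then gives $F_I(t) = 1 + t\,F_I(t)^2$, whose solution is the Catalan generating function $C(t)$. The Python sketch below (function names are ours) checks this on truncated power series by iterating the equation as a fixed point.

```python
from math import comb

def mul(a, b, order):
    """Product of two power series given as coefficient lists, truncated."""
    out = [0] * order
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < order:
                out[i + j] += x * y
    return out

def solve_forest_series(order):
    """First `order` coefficients of the solution of F = 1 + t * F^2."""
    f = [0] * order
    for _ in range(order):          # each pass fixes one more coefficient
        square = mul(f, f, order)
        f = [1] + square[:order - 1]
    return f

def catalan(n):
    return comb(2 * n, n) // (n + 1)
```

The same mechanical translation, applied to all the rules of the grammar, produces the algebraic system behind Theorem 2.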

Solving the quadratic equation (10) leads to an explicit formula for $F$ (and
hence $T$), details of which are omitted due to space constraints. Nonetheless,
these explicit expressions can be used to compute an asymptotic estimate using
a transfer theorem [5, Cor. VI.1 p. 392].





Theorem 3. The number of tree alignments of size $n$ is asymptotically
equivalent to $\kappa \times n^{-3/2} \times 6^n$, where
$\kappa = \sqrt{2}(3 - \sqrt{3})/(24\sqrt{\pi})$.

Corollary 1 The average number of tree alignments for a random pair of trees of
cumulated size $n$ is asymptotically equivalent to $\kappa' \times 1.5^n$, where
$\kappa' = \sqrt{2}(3 - \sqrt{3})/6$.
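
The constant of Corollary 1 can be recovered from Theorem 3 by a short computation, sketched here under the assumption (ours) that both trees of the pair are non-empty. It uses only the identity $C(t) = 1 + t\,C(t)^2$ and the classical estimate of Catalan numbers.

```latex
% Assumption: both trees are non-empty, so trees of size m are counted by Cat(m-1).
\begin{align*}
\#\{\text{pairs of trees of cumulated size } n\}
  &= [t^n]\,\bigl(t\,C(t)\bigr)^2
   = [t^n]\,t\,\bigl(C(t)-1\bigr)
   = \mathrm{Cat}(n-1)
   \sim \frac{4^{\,n-1}}{\sqrt{\pi}\,n^{3/2}},\\[4pt]
\frac{\kappa\, n^{-3/2}\, 6^n}{4^{\,n-1}/(\sqrt{\pi}\,n^{3/2})}
  &= 4\sqrt{\pi}\,\kappa \left(\tfrac{3}{2}\right)^{n}
   = \frac{\sqrt{2}\,(3-\sqrt{3})}{6}\; 1.5^{\,n}
   \approx 0.299 \times 1.5^{\,n}.
\end{align*}
```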

Similar techniques can be used to characterize the distribution of the number
of matches in a random tree alignment. A direct application of [5, Theorem
IX.12 p. 676] indeed gives the following.

Proposition 2 Let $m_n$ be the random variable that counts the number of
matches in a uniformly drawn random tree alignment of size $n$. The variable
$m_n$ asymptotically follows a Normal law of mean $E(m_n) \sim n/6$ and variance
$V(m_n) \sim n/6$.


4.2 Sampling alignments between two given trees 

We now consider two fixed trees $S$ and $T$, and consider the task of sampling a
tree alignment $A$ such that $\pi_1(A) = S$ and $\pi_2(A) = T$, with respect to
the Gibbs-Boltzmann probability distribution. This can be used to assess the
stability of a prediction. We refer the interested reader to our introduction
for examples of further motivation and possible applications.

Preliminaries. Let $\mathcal{T}_{S,T}$ be the set of all supertrees $A$ such
that $\pi_1(A) = S$ and $\pi_2(A) = T$, and $\mathcal{A}_{S,T}$ be a
representative set of $\mathcal{T}_{S,T}$. In other words, $\mathcal{A}_{S,T}$
can be interpreted as the set of all alignments between $S$ and $T$. For any
supertree $A \in \mathcal{T}_{S,T}$, we define its edit score $s(A)$ as the sum
of the number of insertions, deletions and matches $(x, y)$ such that
$x \neq y$.²

For a given positive constant $k_B$, the partition function $Z_{S,T}$ of
$\mathcal{A}_{S,T}$ and the Gibbs-Boltzmann probability $\Pr(A)$ of an alignment
$A \in \mathcal{A}_{S,T}$ are defined as

$$Z_{S,T} = \sum_{A \in \mathcal{A}_{S,T}} e^{-s(A)/k_B}, \qquad \Pr(A) = \frac{e^{-s(A)/k_B}}{Z_{S,T}}.$$

When $k_B$ tends to 0, this distribution tends to the uniform distribution over
supertrees of minimum edit score, while, when $k_B$ tends to $+\infty$, it tends
toward the uniform distribution over $\mathcal{A}_{S,T}$.
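
The two limit regimes are easy to observe numerically. The following Python sketch (the function name `boltzmann` and the toy scores are ours) turns a list of edit scores into Gibbs-Boltzmann probabilities; scores are shifted by their minimum before exponentiation, which leaves the probabilities unchanged and only avoids numerical underflow.

```python
from math import exp

def boltzmann(scores, kB):
    """Probabilities proportional to exp(-s / kB) for a list of edit scores."""
    low = min(scores)
    # Shifting by the minimum score does not change the normalized values.
    weights = [exp(-(s - low) / kB) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]
```

With scores `[2, 2, 5]`, a very small `kB` puts essentially all the mass on the two alignments of minimum score, while a very large `kB` gives each of the three alignments a probability close to 1/3.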

We consider the following problem: given two trees $S$ and $T$, and a positive
constant $k_B$, design a sampling algorithm for alignments between $S$ and $T$
under the Gibbs-Boltzmann probability distribution. This problem generalizes
the classic combinatorial optimization problem of computing a tree alignment
between $S$ and $T$ having minimum edit score.

² The present results can be trivially extended to any edit scoring system that
is a positive linear combination of the numbers of insertions, deletions and
matches.




To address this problem, we rely on dynamic programming, following the
approach described, among others, in [10] for RNA folding. We begin by adapting
the grammar introduced in Section 3 into a grammar for $\mathcal{A}_{S,T}$, then
detail how this grammar leads to an efficient sampling algorithm.

A grammar for $\mathcal{A}_{S,T}$. In order to guarantee that each supertree $A$
indeed aligns two input trees $S$ and $T$ (namely $\pi_1(A) = S$ and
$\pi_2(A) = T$), we need to restrict which rules in the grammar can be used,
conditionally on which trees and forests are currently being generated. To that
purpose, we introduce, for each set $\mathcal{S}$ in the previous grammar, an
indexed version $\mathcal{S}[u, v]$ which denotes the restriction of
$\mathcal{S}$ to alignments between $u$ and $v$, two forests in $S$ and $T$.

Slightly abusing previous notations, we denote by $a(u)$ the tree whose root
is a vertex $a$ and whose (forest of) children is $u$. Finally, for every
tree/forest $X$, Ins$(X)$ (resp. Del$(X)$) represents the
supertree/superforest obtained from $X$ by inserting (resp. deleting) each of
its elements. If $X$ is empty, Ins$(X)$ and Del$(X)$ denote the empty
superforest. The grammar for $\mathcal{A}_{S,T}$ is described in Fig. 5.

Theorem 4. Let $S$ and $T$ be non-empty trees. The set of supertrees
$\mathcal{A}_{S,T}$ generated by grammar (11)-(18) is representative of
$\mathcal{T}_{S,T}$, the tree alignments between $S$ and $T$.

Applications to dynamic programming. The grammar defined by Equations (11)-(18)
is a decomposition scheme for the alignments between $S$ and $T$. It can
easily be transformed into an algorithm for computing the partition function
$Z_{S,T}$. Indeed, $Z_{S,T}$ is simply a weighted sum over all possible
supertrees of $\mathcal{A}_{S,T}$, which is a set generated by the grammar. Now
consider the image of the grammar as a set of numerical equations, obtained by
syntactically replacing:

- The operators $(\oplus, \circ)$ with $(+, \times)$ respectively;

- The empty set $\emptyset$ with 0;

- Inserted/Deleted trees/forests Ins$(X)$ and Del$(X)$ with $e^{-|X|/k_B}$,
where $|X|$ is the number of vertices of $X$;

- Match events MatchRoot$(V, a, a)$ with $V$, $\forall a \in \Sigma$ and any
expression $V$;

- Insertion events InsRoot$(V, a)$, deletion events DelRoot$(V, a)$, and
mismatch events MatchRoot$(V, a, b)$ with $e^{-1/k_B} \times V$,
$\forall a \neq b \in \Sigma$ and any $V$.

Theorem 4 immediately implies that the resulting set of equations is a dynamic
programming scheme that computes $Z_{S,T}$ instead of $\mathcal{A}_{S,T}$.
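
To make this substitution concrete in a much simpler setting, the Python sketch below computes a partition function for sequences rather than trees. It is not the tree algorithm of this section: it uses the unambiguous sequence grammar of Remark 1 (no deletion immediately before an insertion) and the same edit score, with a hypothetical function name and state encoding. Unions become sums, concatenations become products, and each insertion, deletion or mismatch contributes a factor $e^{-1/k_B}$.

```python
from functools import lru_cache
from math import exp

def partition_function(s, t, kB):
    """Sum of exp(-score / kB) over canonical alignments of strings s and t."""
    w = exp(-1.0 / kB)   # weight of one insertion, deletion or mismatch

    @lru_cache(maxsize=None)
    def z(i, j, after_deletion):
        """Partition function of the canonical alignments of s[i:] and t[j:]."""
        if i == len(s) and j == len(t):
            return 1.0                               # empty alignment
        total = 0.0
        if i < len(s) and not after_deletion:        # insertion of s[i]
            total += w * z(i + 1, j, False)
        if j < len(t):                               # deletion of t[j]
            total += w * z(i, j + 1, True)
        if i < len(s) and j < len(t):                # match or mismatch
            step = 1.0 if s[i] == t[j] else w
            total += step * z(i + 1, j + 1, False)
        return total

    return z(0, 0, False)
```

For `s = t = "a"`, the canonical alignments are the match (weight 1) and the insertion followed by the deletion (weight $e^{-2/k_B}$), so the function returns $1 + e^{-2/k_B}$. Because the grammar is unambiguous, letting $k_B$ grow makes the value tend to the number of distinct alignments.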

Moreover, each non-terminal term of the modified grammar now contains
the partition function of the set of supertrees associated to this non-terminal
term in the set-theoretic grammar, e.g. a term
$\mathcal{VH}[a(u) \circ X, b(v)]$. This information can then be used to define
an algorithm to sample supertrees from $\mathcal{A}_{S,T}$ under the
Gibbs-Boltzmann distribution, following the recursive method for random
generation [15].

To do so, it suffices to reinterpret the grammar defined by Equations (11)-(18)
as a branching process: each $\oplus$ operator is replaced by a branching
operator

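$$\mathcal{A}_{S,T} = \mathcal{V}^{\emptyset}[S, T] \oplus \mathrm{InsRoot}(\mathrm{Ins}(X_S) \circ \mathrm{Del}(T), r_S), \qquad \text{where } S = r_S(X_S) \tag{11}$$

$$\mathcal{V}^{\emptyset}[a(u), b(v)] = \mathcal{V}^{\uparrow}[a(u), b(v)] \oplus \mathrm{InsRoot}(\mathcal{VH}[u, b(v)], a) \tag{12}$$

$$\mathcal{V}^{\uparrow}[a(u), b(v)] = \mathrm{MatchRoot}(\mathcal{H}_{\not{D},\emptyset,\emptyset}[u, v], a, b) \oplus \bigoplus_{Y \circ c(w) \circ Y' = v} \mathrm{DelRoot}(\mathrm{Del}(Y) \circ \mathcal{V}^{\uparrow}[a(u), c(w)] \circ \mathrm{Del}(Y'), b) \tag{13}$$

$$\mathcal{VH}[\emptyset, b(v)] = \emptyset \tag{14}$$

$$\mathcal{VH}[a(u) \circ X, b(v)] =
\begin{array}{ll}
 & \mathrm{Ins}(a(u)) \circ \mathcal{VH}[X, b(v)]\\
\oplus & \displaystyle\bigoplus_{\substack{X' \circ X'' = a(u) \circ X\\ |X'| \ge 2}} \mathrm{DelRoot}(\mathcal{H}_{\not{D},\leftrightarrow,\emptyset}[X', v], b) \circ \mathrm{Ins}(X'')\\
\oplus & \mathcal{V}^{\emptyset}[a(u), b(v)] \circ \mathrm{Ins}(X)
\end{array}
\tag{15}$$

For every $\nu, M, M'$ with $\nu \in \{\not{D}, D\}$ and $M, M' \in \{\emptyset, \leftrightarrow, \rightarrow\}$:

$$\mathcal{H}_{\nu,M,M'}[X, \emptyset] =
\begin{cases}
\mathrm{Ins}(X) & \text{if } (M, M') = (\emptyset, \emptyset),\\
\emptyset & \text{otherwise,}
\end{cases}
\tag{16}$$

$$\mathcal{H}_{\nu,M,M'}[\emptyset, Y] =
\begin{cases}
\mathrm{Del}(Y) & \text{if } (M, M') = (\emptyset, \emptyset),\\
\emptyset & \text{otherwise,}
\end{cases}
\tag{17}$$

$$\mathcal{H}_{\nu,M,M'}[a(u) \circ X, b(v) \circ Y] =
\begin{array}{lll}
 & \mathrm{Ins}(a(u)) \circ \mathcal{H}_{\not{D},M,M'}[X, b(v) \circ Y] & \text{if } \nu \neq D \text{ and if } M \neq \leftrightarrow\\
\oplus & \mathrm{Del}(b(v)) \circ \mathcal{H}_{D,M,M'}[a(u) \circ X, Y] & \text{if } M' \neq \leftrightarrow\\
\oplus & \mathcal{V}^{\uparrow}[a(u), b(v)] \circ \mathcal{H}_{\not{D},\alpha(M,X),\alpha(M',Y)}[X, Y] & \\
\oplus & \displaystyle\bigoplus_{\substack{Y' \circ Y'' = b(v) \circ Y\\ |Y'| \ge 2}} \mathrm{InsRoot}(\mathcal{H}_{\not{D},\emptyset,\leftrightarrow}[u, Y'], a) \circ \mathcal{H}_{\not{D},\alpha(M,X),\alpha(M',Y'')}[X, Y''] & \\
\oplus & \displaystyle\bigoplus_{\substack{X' \circ X'' = a(u) \circ X\\ |X'| \ge 2}} \mathrm{DelRoot}(\mathcal{H}_{\not{D},\leftrightarrow,\emptyset}[X', v], b) \circ \mathcal{H}_{\not{D},\alpha(M,X''),\alpha(M',Y)}[X'', Y] &
\end{array}
\tag{18}$$

where $\alpha(\emptyset, X) = \emptyset$ and $\alpha(\leftrightarrow, X) = \alpha(\rightarrow, X) = \emptyset$ if $X = \emptyset$, and $\rightarrow$ otherwise.


Fig. 5. A grammar for $\mathcal{A}_{S,T}$, a representative set of all tree
alignments between two fixed trees $S$ and $T$.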


that, instead of joining sets of supertrees into a larger set of supertrees,
chooses one of the sets according to the weight of its partition function. For
instance, assume we have a grammar rule $U = V \oplus W$: the sampling algorithm
will select one of the sets $V$, $W$, with $V$ being chosen with probability
$Z_V/(Z_V + Z_W)$, and $W$ with probability $Z_W/(Z_V + Z_W)$, provided that
$Z_V$ and $Z_W$ have been previously computed. Recursive calls will then result
in a supertree, which is provably randomly generated under the Gibbs-Boltzmann
distribution.
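
The branching process can again be illustrated on sequences instead of trees. The sketch below is a simplified stand-in for the tree algorithm, with hypothetical names: it reuses the unambiguous sequence grammar of Remark 1, computes the partition function of every state, and then draws an alignment by choosing, in each state, one of the available rules with probability proportional to the partition function of what that rule generates.

```python
import random
from functools import lru_cache
from math import exp

def make_sampler(s, t, kB, rng=None):
    """Return a function drawing canonical alignments of s and t at random."""
    rng = rng or random.Random()
    w = exp(-1.0 / kB)   # weight of one insertion, deletion or mismatch

    def options(i, j, after_deletion):
        """(weight, column, next state) for each rule usable in this state."""
        out = []
        if i < len(s) and not after_deletion:
            out.append((w * z(i + 1, j, False), (s[i], "-"), (i + 1, j, False)))
        if j < len(t):
            out.append((w * z(i, j + 1, True), ("-", t[j]), (i, j + 1, True)))
        if i < len(s) and j < len(t):
            step = 1.0 if s[i] == t[j] else w
            out.append((step * z(i + 1, j + 1, False),
                        (s[i], t[j]), (i + 1, j + 1, False)))
        return out

    @lru_cache(maxsize=None)
    def z(i, j, after_deletion):
        """Partition function of the alignments generated from this state."""
        if i == len(s) and j == len(t):
            return 1.0
        return sum(weight for weight, _, _ in options(i, j, after_deletion))

    def sample():
        state, columns = (0, 0, False), []
        while state[:2] != (len(s), len(t)):
            opts = options(*state)
            # Branching step: pick a rule proportionally to its weight.
            _, column, state = rng.choices(
                opts, weights=[o[0] for o in opts], k=1)[0]
            columns.append(column)
        return columns

    return sample
```

For `s = t = "a"` and `kB = 1`, the sampler returns the single match with probability $1/(1 + e^{-2})$ and the insertion followed by the deletion otherwise; the equivalent deletion-then-insertion alignment is never produced.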




Theorem 5. Let $S$ and $T$ be two trees of respective sizes $n_S$ and $n_T$.
The above-defined branching process adapted from grammar (11)-(18) defines an
algorithm that samples a supertree from $\mathcal{A}_{S,T}$ under the
Gibbs-Boltzmann distribution. The worst-case time and space complexities of
the algorithm are in $O(n_S\, n_T\, (n_S + n_T)^2)$, while the average-case
time and space complexities are in $O(n_S\, n_T)$.

The correctness of the algorithm immediately follows from Theorem 4. Its
complexities are identical to those of [9,6] since the structure of the DP
scheme essentially remains the same; only the number of DP tables is increased
(by a constant factor). This implies that our algorithm, while solving a much
more general problem, retains the same asymptotic complexity (up to constants)
as the current tree alignment algorithms that are limited to computing a single
optimal tree alignment.


5 Conclusion and discussion 

Following a classical line of research in string algorithms, we introduced the
notion of equivalence for tree alignments, and described a context-free
grammar for a representative set of all possible alignments. We also showed how
this grammar can be used to derive asymptotic properties of alignments, and to
design an efficient dynamic programming sampling algorithm for alignments
between two given trees.

From an applied point of view, our results make it possible to sample optimal,
as well as suboptimal, tree alignments for a pair of given trees under the
Gibbs-Boltzmann distribution; following the program outlined in [10], we are
currently using this algorithm to revisit the alignment of RNA structures.

Our proposed grammar for tree alignments is more complex than the grammars
used to generate a representative set of sequence alignments, although the
dynamic programming schemes for computing optimal sequence and tree alignments
are very similar. This is due to the fact that it is particularly hard to
characterize a representative set of tree alignments (see Remark 1). It thus
remains an open problem to design a representative set of tree alignments that
would be amenable to enumeration using a simpler grammar. However, it is
important to remark that, despite its apparent complexity, our grammar leads to
algorithms with an asymptotic complexity of the same order as existing
optimization algorithms.

From a theoretical point of view, we believe that tree alignments as defined
in this work form an interesting combinatorial family whose properties
deserve to be explored in depth. More generally, it would be interesting to
characterize the conditions under which an instance-agnostic grammar,
enumerating a search space, could be adapted into a decomposition for a
specific instance. Such a theory, at the confluence of enumerative combinatorics
and algorithmic design, could provide new principled ways to design
dynamic programming algorithms.